Language Identification in Document Images
Similar resources
Language Identification in Document Images
This paper presents a system dedicated to automatic language identification of text regions in heterogeneous and complex documents. This system is able to process documents with mixed printed and handwritten text and various layouts. To handle such a problem, we propose a system that performs the following sub-tasks: writing type identification (printed/handwritten), script identification and l...
Script and Language Identification in Degraded and Distorted Document Images
This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages through document vectorization, which transforms each document image into an electronic document vector that characterizes the shape and frequency of the contained character and word images. We first identify scripts bas...
Language Identification in Complex, Unoriented, and Degraded Document Images
We describe algorithms for identifying the language of text in document images which are complex, unoriented, and degraded. We distinguish among seven languages... The page layouts may be complex, containing text blocks in unknown, roughly Manhattan arrangements. The pages may be unoriented, that is, upright or rotated by 90, 180, or 270 degrees. The images may be degraded by digitization at coarse and uneq...
Language Identification in Degraded and Distorted Document Images
This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Different from the reported methods that transform word images through a character shape coding process, our method directly captures word shapes with the local extremum points and the horizontal intersection numbers, which are both tolerant of noise, char...
Techniques for Language Identification for Hybrid Arabic-English Document Images
Because the characteristics of Arabic differ from those of the Romance and Anglo-Saxon languages, recognition of documents written in a hybrid of these languages requires that the language of the text be identified prior to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script...
Journal
Journal title: Electronic Imaging
Year: 2016
ISSN: 2470-1173
DOI: 10.2352/issn.2470-1173.2016.17.drr-058